Abstract

In this paper, we propose a proof-carrying code framework for program-generators. The enabling
technique is abstract parsing, a static string analysis technique, which is used as a component
for generating and validating certificates. Our framework provides an efficient solution for
certifying program-generators whose safety properties are expressed in terms of the grammar
representing the generated programs. On the code producer side, the fixed-point solution of the
analysis is computed and attached to the program-generator. The consumer receives the code with
the fixed-point solution and validates that the received solution is indeed a fixed point for the
received code. This validation can be done in a single pass.


1 Introduction 


To certify the safety of a mobile program-generator, we need to ensure not only the safe execution of 
the generator itself but also that of the generated programs. Safety properties of the generated programs 
are specified efficiently in terms of the grammar representing the generated programs. For instance, the 
safety property “generated programs should not have nested loops” can be specified and verified by the 
reference grammar for the generated programs. 

Recently, Doh, Kim, and Schmidt presented a powerful static string analysis technique called abstract
parsing [4]. Using LR parsing as a component, abstract parsing analyzes the program and determines
whether the strings generated in the program conform to the given grammar or not.

In this paper, we propose a Proof-Carrying Code (PCC) framework [8, 9] for program-generators.
We adapt abstract parsing to check the generated programs of the program-generators. With the
grammar specifying the safety property of the generated programs, the code producer abstract-parses
the program-generator and computes a fixed-point solution as a certificate. The code producer sends
the program-generator with the computed fixed-point solution. The code consumer receives the
program-generator accompanied by the fixed-point solution and validates that the received fixed
point is indeed the solution for the received program-generator. Our framework can be seen as an
abstraction-carrying code framework [1, 5] specialized to program-generators, which are modeled by a
two-staged language with concatenation.

This work is, to our knowledge, the first to present a proof-carrying code framework that certifies
grammatical properties of the generated programs. By directly computing the parse stack information
as a form of the fixed-point solution, abstract parsing provides an efficient way to validate the
certificates on the code consumer side. In contrast to abstract parsing, the previous static string
analysis techniques [2, 3, 7] approximate the possible values of a string expression of the program
with a grammar and check whether the approximated grammar is included in the reference grammar. This
grammar inclusion check takes too much time, which makes those techniques difficult to use as the
validation component of a PCC framework.
2 Language

To develop our idea, we consider a two-staged language with concatenation in which
program-generators can be modeled. The language is an imaginary, first-order language whose only
value is code. The language is minimal, so as not to distract from our focus on static analysis. For
example, loops and branches carry no condition expression, since abstract interpretation considers
all iterations and all branches anyway.

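A program is an expression $e$:

$$e \in \mathit{Exp} ::= x \mid \texttt{let}\ x\ e_1\ e_2 \mid \texttt{or}\ e_1\ e_2 \mid \texttt{re}\ x\ e_1\ e_2\ e_3 \mid \texttt{`}f$$

An expression can contain code fragments $f$:

$$f \in \mathit{Frag} ::= x \mid \texttt{let} \mid \texttt{or} \mid \texttt{re} \mid \texttt{(} \mid \texttt{)} \mid f_1.f_2 \mid \texttt{,}e$$

The operational semantics of the language is defined in Figure 3 (left).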

Expression or $e_1$ $e_2$ is for branches. Its value could be the value of $e_1$ or the value of
$e_2$. Expression re $x$ $e_1$ $e_2$ $e_3$ is for loops. Variable $x$ has the value of $e_1$ as its
initial value. Loop body $e_2$ is iterated $\geq 0$ times. The result of each iteration of $e_2$ is
bound to $x$ in $e_2$ for the next iteration or in $e_3$ for the result of the loop. Backquote form
`$f$ is for code fragment $f$. We construct the fragment by using the following tokens: variables,
let, or, re, (, and ). Compound fragment $f_1.f_2$ concatenates two code fragments $f_1$ and $f_2$.
Comma fragment ,$e$ first evaluates $e$ and then substitutes its result code value for itself. Note
that the meaning of `$f$ and ,$e$ is the same as in LISP's quasi-quotation system.
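
To make the informal description above concrete, the following is a minimal Python sketch of the
language, not the semantics of Figure 3. The class names, the function names, and the max_iter bound
on loop iterations are all assumptions introduced for illustration. Code values are modeled as
tuples of tokens, and the evaluator returns every code value an expression may produce.

```python
from dataclasses import dataclass

# Expressions: e ::= x | let x e1 e2 | or e1 e2 | re x e1 e2 e3 | `f
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Let:
    name: str
    bound: object
    body: object

@dataclass(frozen=True)
class Or:
    left: object
    right: object

@dataclass(frozen=True)
class Re:
    name: str
    init: object
    body: object
    result: object

@dataclass(frozen=True)
class Quote:
    frag: object

# Fragments: f ::= token | f1.f2 | ,e
@dataclass(frozen=True)
class Tok:
    text: str

@dataclass(frozen=True)
class Concat:
    first: object
    second: object

@dataclass(frozen=True)
class Comma:
    exp: object

def eval_exp(e, env, max_iter):
    """Return the set of code values (token tuples) that e may evaluate to."""
    if isinstance(e, Var):
        return {env[e.name]}
    if isinstance(e, Let):
        return {v for b in eval_exp(e.bound, env, max_iter)
                for v in eval_exp(e.body, {**env, e.name: b}, max_iter)}
    if isinstance(e, Or):
        return eval_exp(e.left, env, max_iter) | eval_exp(e.right, env, max_iter)
    if isinstance(e, Re):
        current = eval_exp(e.init, env, max_iter)
        results = set()
        for _ in range(max_iter + 1):  # hypothetical bound: 0..max_iter iterations
            # Leave the loop now: evaluate the result expression with x bound.
            for v in current:
                results |= eval_exp(e.result, {**env, e.name: v}, max_iter)
            # Or run the body once more and rebind x to its result.
            current = {w for v in current
                       for w in eval_exp(e.body, {**env, e.name: v}, max_iter)}
        return results
    if isinstance(e, Quote):
        return eval_frag(e.frag, env, max_iter)
    raise TypeError(f"unknown expression: {e!r}")

def eval_frag(f, env, max_iter):
    """Return the set of code values that fragment f may denote."""
    if isinstance(f, Tok):
        return {(f.text,)}
    if isinstance(f, Concat):
        return {a + b for a in eval_frag(f.first, env, max_iter)
                for b in eval_frag(f.second, env, max_iter)}
    if isinstance(f, Comma):
        return eval_exp(f.exp, env, max_iter)
    raise TypeError(f"unknown fragment: {f!r}")

# The loop  re x `a `( ( . ,x . ) ) x  wraps the token a in one more pair of
# parentheses on every iteration.
loop = Re("x", Quote(Tok("a")),
          Quote(Concat(Tok("("), Concat(Comma(Var("x")), Tok(")")))),
          Var("x"))
codes = eval_exp(loop, {}, max_iter=2)
```

With at most two iterations, codes contains the three token sequences a, ( a ), and ( ( a ) ).
Without the bound the set is infinite, which is why the analysis works on abstractions of code
rather than on the code values themselves.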


3 Abstract Parsing 

In our framework, we use abstract parsing [4] as a component to generate and validate the certificate.
Abstract parsing derives data-flow equations from the program and solves them in the parsing domain.
In [6], we formulated abstract parsing in the abstract interpretation framework.

The key idea of abstract parsing is an abstraction of code. Code $c$ is abstracted into a parse-stack
transition function $f = \lambda p.\,parse(p, c)$, where $parse$ is a parsing function defined by an
LR parser generator with the safety grammar $G$. This choice of abstraction is necessary to handle
code concatenation $x.y$. If the abstracted functions for the code fragments $x$ and $y$ are
$f_x = \lambda p.\,parse(p, x)$ and $f_y = \lambda p.\,parse(p, y)$ respectively, the abstracted
function for the code concatenation $x.y$ is constructed by function composition of $f_x$ and $f_y$
as $f_{x.y} = f_y \circ f_x$.
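
The composition property can be illustrated with a small Python sketch. The parse function below is
a toy pushdown recognizer for parentheses that merely stands in for a generated LR parser; it, the
ERROR marker, and the helper names abstract and compose are assumptions made for this example.

```python
ERROR = None  # hypothetical marker for a failed parse

def parse(stack, tokens):
    """Toy stand-in for an LR parser: run tokens over a stack of open parens."""
    for tok in tokens:
        if stack is ERROR:
            return ERROR
        if tok == "(":
            stack = stack + ("(",)
        elif tok == ")" and stack and stack[-1] == "(":
            stack = stack[:-1]
        else:
            return ERROR
    return stack

def abstract(code):
    """Abstract code c into the stack transition function f = λp. parse(p, c)."""
    return lambda p: parse(p, code)

def compose(f, g):
    """Function composition: (f ∘ g)(p) = f(g(p))."""
    return lambda p: f(g(p))

x = ("(", "(")
y = (")",)
f_x, f_y = abstract(x), abstract(y)
f_xy = compose(f_y, f_x)  # f_{x.y} = f_y ∘ f_x
```

Applying f_xy to a stack gives the same result as parsing the concatenated tokens x + y directly, so
the abstraction of a concatenation never needs the concatenated code itself. The order matters:
composing the other way round parses y first and fails on the empty stack.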

As illustrated in Figure 1, we take a series of abstraction steps for the value domain of the semantics.


Collecting Semantics: $2^{Code}$
  → Concrete Parsing Semantics: $2^{P \to P}$
  → First Step Abstraction Semantics: $2^P \to 2^P$
  → Parameterized Abstract Parsing Semantics: $D^\sharp \to D^\sharp$


Figure 1: Series of abstraction steps for the value domain in semantics where P is the set of parse stacks. 

Starting from the collecting semantics defined in Figure 3 (middle), each abstraction of the value
domain derives new abstract semantics.
To ensure the termination of the analysis, we need to provide an abstraction for the infinite-height
domain $2^P$. Instead of using a particular abstract domain for $2^P$, we parameterize this abstract
domain by providing conditions which an abstract domain $D^\sharp$ needs to satisfy.

1. $D^\sharp$ should be a complete partial order (CPO).

2. $D^\sharp$ is Galois connected with the powerset of parse stacks $2^P$.

3. An abstracted parsing function $Parse\_action^\sharp$ is defined as a sound approximation of the
parsing function $Parse\_action$, which is defined by the LR parser generator with the safety grammar $G$.

Finally, we derive the abstract parsing semantics for $D^\sharp$ as in Figure 3 (right).

Given a program-generator $e$ and an empty environment $\sigma_0$, the analysis computes
$F = [\![e]\!]\,\sigma_0$, which is of type $D^\sharp \to D^\sharp$. To determine whether the programs
generated by a program-generator $e$ conform to the safety grammar, we check that the following
equation holds:

$$F(\alpha_{2^P \to D^\sharp}(\{p_{init}\})) = \alpha_{2^P \to D^\sharp}(\{p_{acc}\})$$

where $p_{init}$ and $p_{acc}$ are the initial parse stack and the accepting parse stack for the
safety grammar $G$.
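
As a rough illustration of this check, the Python sketch below takes the abstract domain to be the
powerset of parse stacks itself, with the identity as abstraction function. This choice does not
meet the termination requirement in general and is used only to keep the example small. The toy
recognizer, the end marker "$", and the names P_INIT, P_ACC, analyze, and is_safe are assumptions,
not part of the framework.

```python
P_INIT = ()           # initial parse stack of the toy recognizer
P_ACC = ("ACC",)      # accepting parse stack of the toy recognizer
ERROR = ("ERROR",)    # hypothetical stack standing for a parse error

def parse(stack, tokens):
    """Toy recognizer for balanced parentheses terminated by the end marker '$'."""
    for tok in tokens:
        if stack == ERROR or stack == P_ACC:
            return ERROR  # nothing may follow an error or acceptance
        if tok == "(":
            stack = stack + ("(",)
        elif tok == ")" and stack and stack[-1] == "(":
            stack = stack[:-1]
        elif tok == "$" and stack == P_INIT:
            stack = P_ACC
        else:
            return ERROR
    return stack

def alpha(stacks):
    """Identity abstraction: the abstract value is the set of stacks itself."""
    return frozenset(stacks)

def analyze(branches):
    """F for a generator that picks one of several constant code values."""
    return lambda stacks: frozenset(parse(p, c) for p in stacks for c in branches)

def is_safe(F):
    """Check F(alpha({p_init})) == alpha({p_acc})."""
    return F(alpha({P_INIT})) == alpha({P_ACC})

safe_gen = analyze([("(", ")", "$"), ("(", "(", ")", ")", "$")])
unsafe_gen = analyze([("(", ")", "$"), ("(", "(", ")", "$")])
```

Here is_safe(safe_gen) holds because every branch drives the initial stack to the accepting stack,
while is_safe(unsafe_gen) fails because one branch leaves a parenthesis open and ends in the error
stack.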

4 PCC Framework for Program-Generators 

Figure 2 illustrates a PCC framework for program-generators, an abstraction-carrying code framework
[1, 5] specialized to program-generators by means of abstract parsing. The code producer and the code
consumers share the safety grammar, which specifies the safety properties of the generated programs.



Figure 2: A proof-carrying code framework for program-generators. 

The code producer proves the safety of the program-generator by abstract parsing with the shared 
safety grammar. In a complex and iterative process, the analysis computes a fixed-point solution. This 
solution is used as a certificate for the safety of the program-generator. The code producer uploads or 
sends the program-generator with the computed fixed-point solution. 

The code consumer downloads or receives the untrusted program-generator and its attached fixed-point
solution. The code consumer validates that the received fixed-point solution is indeed a fixed-point
solution of the received program-generator. In contrast to computing a fixed-point solution on the
code producer side, checking can be done in a single pass.
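
The asymmetry between the two sides can be sketched in Python as follows. The equation F, the depth
bound K that keeps the toy domain finite, and the function names are assumptions chosen for
illustration; they are not the abstract domain or the equations of the framework.

```python
K = 3  # hypothetical depth bound that keeps the toy domain finite

def push_open(stack):
    """Abstract effect of emitting '(': push it and keep only the top K symbols."""
    return (stack + ("(",))[-K:]

def F(stacks):
    """One application of the loop equation X = {p_init} ∪ push_open(X)."""
    return frozenset({()}) | frozenset(push_open(p) for p in stacks)

def solve(F):
    """Producer side: iterate F from the empty set until nothing changes."""
    x, rounds = frozenset(), 0
    while True:
        nxt = F(x)
        rounds += 1
        if nxt == x:
            return x, rounds
        x = nxt

def validate(F, cert):
    """Consumer side: a single application of F decides whether cert is a fixed point."""
    return F(cert) == cert

cert, rounds = solve(F)
```

The producer needs several rounds to reach the solution, whereas validate applies F exactly once. A
tampered or incomplete certificate such as frozenset({()}) is rejected because F adds a stack that
the certificate does not contain.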
5 Issues

The proposed framework addresses two fundamental PCC issues. 

1. The certificate, a fixed-point solution for the program-generator, is generated automatically by 
abstract parsing. 

2. The checking procedure on the code consumer side is done efficiently by validating the received
fixed-point solution.

However, we have several issues for further investigation. 

1. Size of the certificate: We are not sure that the size of the fixed-point solution which our framework
generates is small enough for the mobile platform. However, there are some ideas on reducing the
size of certificates. First, the certificate can be compressed. Abstract parsing uses an abstract parse
stack as a component of the value domain. Since a parse stack is a string of characters from a
pre-defined finite alphabet, an appropriate compression algorithm can be used to reduce the size of
the fixed-point solution. Second, some parts of the certificate could be deleted as long as their
recovery takes time linear in the size of the received code.

2. Size of the trust base: Similar to other abstraction-carrying code frameworks, the certificate checker 
of our framework is almost as complex as the certificate generator. It is essential to simplify the 
certificate checker to reduce the size of the trust base. 
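
The compression idea from the first issue above can be tried out with a few lines of Python. This is
only a sketch under assumptions: the certificate is modeled as a non-empty list of parse stacks, the
state names such as s0 are made up, and zlib is one possible general-purpose compressor rather than
a choice made by the framework.

```python
import zlib

def serialize(cert):
    """Write one parse stack per line, states separated by single spaces."""
    return "\n".join(" ".join(stack) for stack in cert).encode("utf-8")

def encode(cert):
    """Serialize a non-empty certificate and compress it with zlib."""
    return zlib.compress(serialize(cert), 9)

def decode(blob):
    """Invert encode; state names are assumed to contain no whitespace."""
    text = zlib.decompress(blob).decode("utf-8")
    return [tuple(line.split()) for line in text.split("\n")]

# Hypothetical certificate with the kind of repetition a small alphabet produces.
cert = [("s0", "s2", "s5"), ("s0", "s2", "s5", "s7"), ()] * 100
blob = encode(cert)
```

Because the stacks are drawn from a small alphabet and share long prefixes, the compressed blob is
far smaller than the serialized text in this example, and decode(blob) restores the certificate
exactly.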


References 

[1] E. Albert, G. Puebla, and M. Hermenegildo. Abstract interpretation-based approach to mobile code safety. In
Proceedings of Compiler Optimization meets Compiler Verification, 2004.

[2] Tae-Hyoung Choi, Oukseh Lee, Hyunha Kim, and Kyung-Goo Doh. A practical string analyzer by the
widening approach. In Proceedings of the Asian Symposium on Programming Languages and Systems, volume 4729
of Lecture Notes in Computer Science, pages 374-388, Sydney, Australia, November 2006. Springer-Verlag.

[3] Aske Simon Christensen, Anders Møller, and Michael I. Schwartzbach. Precise analysis of string expressions.
In Proceedings of the Static Analysis Symposium, pages 1-18. Springer-Verlag, 2003.

[4] Kyung-Goo Doh, Hyunha Kim, and David Schmidt. Abstract parsing: static analysis of dynamically generated
string output using LR-parsing technology. In Proceedings of the International Static Analysis Symposium,
2009. Available from http://santos.cis.ksu.edu/schmidt/dohsas09.pdf

[5] Manuel V. Hermenegildo, Elvira Albert, Pedro López-García, and Germán Puebla. Abstraction carrying code
and resource-awareness. In Proceedings of the ACM SIGPLAN International Conference on Principles and
Practice of Declarative Programming, pages 1-11, New York, NY, USA, 2005. ACM.

[6] Soonho Kong, Wontae Choi, and Kwangkeun Yi. Abstract parsing for two-staged languages with
concatenation. In Proceedings of the International Conference on Generative Programming and Component
Engineering, 2009. Available from http://ropas.snu.ac.kr/~soon/paper/gpce09.pdf

[7] Yasuhiko Minamide. Static approximation of dynamically generated web pages. In Proceedings of the
International Conference on World Wide Web, pages 432-441, New York, NY, USA, 2005. ACM.

[8] George C. Necula. Proof-carrying code. In Proceedings of the ACM SIGPLAN-SIGACT Symposium on
Principles of Programming Languages, pages 106-119, New York, NY, USA, 1997. ACM.

[9] George C. Necula and Peter Lee. The design and implementation of a certifying compiler. In Proceedings 
of the SIGPLAN Conference on Programming Language Design and Implementation, pages 333-344, New 
York, NY, USA, 1998. ACM. 





PCC Framework for Program-Generators 


Kong, Choi, and Yi 


[Figure 3: operational semantics (left), collecting semantics (middle), and abstract parsing
semantics (right) of the language; the figure body is not recoverable from the source.]





